January 06, 2022


Background

Open Access Coronavirus Disease Epidemiological Data

Johns Hopkins University

The Center for Systems Science and Engineering (CSSE) at Johns Hopkins University provides a public, global COVID-19 GitHub repository (https://github.com/CSSEGISandData/COVID-19) with anonymous patient data aggregated from a number of sources.

We have built a centralised repository of individual-level information on patients with laboratory-confirmed COVID-19 (in China, confirmed by detection of virus nucleic acid at the City and Provincial Centers for Disease Control and Prevention), including their travel history, location (highest resolution available and corresponding latitude and longitude), symptoms, and reported onset dates, as well as confirmation dates and basic demographics. Information is collated from a variety of sources, including official reports from WHO, Ministries of Health, and Chinese local, provincial, and national health authorities. If additional data are available from reliable online reports, they are included. Data are available openly and are updated on a regular basis (around twice a day).

CSSE Data Sources (partial list):

The CSSE data are used for all global analyses in this document.

The New York Times

The New York Times has also provided public human coronavirus disease case and death data for the United States by county and by state. The U.S. data used for this analysis are pulled directly from The New York Times COVID-19 GitHub repository (https://github.com/nytimes/covid-19-data).

The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
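As a minimal sketch of how cumulative counts in this layout can be read, the Python snippet below parses CSV text with the column names used by the repository's state-level file (date, state, fips, cases, deaths). The sample rows are made up for illustration, the function name is hypothetical, and the report's own pipeline is written in R, so this is not the code used to produce the analysis.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the state-level column layout (values are made up).
SAMPLE_CSV = """date,state,fips,cases,deaths
2020-03-01,Washington,53,10,1
2020-03-02,Washington,53,14,2
2020-03-01,New York,36,1,0
2020-03-02,New York,36,3,0
"""


def load_cumulative_by_state(csv_text):
    """Return {state: [(date, cases, deaths), ...]} sorted by date."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        series[row["state"]].append(
            (row["date"], int(row["cases"]), int(row["deaths"]))
        )
    for rows in series.values():
        rows.sort()  # ISO dates sort chronologically as strings
    return dict(series)


by_state = load_cumulative_by_state(SAMPLE_CSV)
print(by_state["Washington"])
```

Because the counts are cumulative, any per-day figure has to be derived by differencing consecutive rows for the same state or county.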

Our World in Data

Hospital and ICU data collected from a number of official sources have been collated and are maintained by Our World in Data. The collated data and a complete list of country-by-country sources are available on GitHub (https://github.com/owid/covid-19-data).


Data Analysis

The COVID-19 data from both the Johns Hopkins and New York Times repositories are pulled and used to calculate the rate of new reported cases for each country and the rates of new reported cases and deaths for each U.S. state and county. These rates are used to generate a predictive regression model for each locale. A risk prediction (ρ) is generated from these models, and the countries, states, and counties with the highest predicted risk are compared in the charts in this document. In the U.S. case-death charts, a generalized additive model (GAM) smoothing function is fit to each data set to make it easier to visualize trends.
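The first steps described above (turning cumulative counts into a rate of new cases and fitting a trend to that rate) can be sketched as follows. This is a Python illustration under stated assumptions: the report's pipeline is in R, its regression model and the definition of ρ are not specified here, and an ordinary least-squares slope is used only as a stand-in. The counts, the population, and all function names are made up.

```python
def daily_new(cumulative):
    """Difference a cumulative series into daily new counts.

    Negative differences (downward revisions by a reporting agency)
    are clipped to zero; that choice is an assumption of this sketch.
    """
    return [max(b - a, 0) for a, b in zip(cumulative, cumulative[1:])]


def per_100k(counts, population):
    """Scale counts to a rate per 100,000 residents."""
    return [c * 100_000 / population for c in counts]


def ols_slope(ys):
    """Least-squares slope of ys against day index 0, 1, 2, ...

    Requires at least two points.
    """
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


cumulative = [100, 110, 125, 145, 170, 200]  # made-up cumulative counts
population = 1_000_000                       # made-up population

rates = per_100k(daily_new(cumulative), population)
trend = ols_slope(rates)  # change in daily rate per 100k, per day
print(rates, trend)
```

A positive slope means the daily rate of new cases is rising over the fitted period; how such a trend is mapped to a risk score is a separate modeling decision that this sketch does not attempt.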

The risk assessment methodology used in this analysis has not been fully validated and is affected by noise in the data. White House press briefings have noted that some counties report on Mondays the incremental changes accumulated over the weekend, and cyclical weekly variation can be observed in the data. This limits the accuracy of the model predictions. To increase prediction robustness, the model has been tuned to use data over a multi-day period as a compromise between the speed of detection of relevant changes in risk predictions and the prediction error caused by sensitivity to noise.
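The effect of a multi-day window on weekly reporting cycles can be illustrated with a trailing moving average. This is a Python sketch, not the report's R implementation: the length of the window actually used is not stated, so the 7-day window here is an assumption, and the reporting pattern is made up.

```python
def trailing_mean(values, window=7):
    """Mean of the most recent `window` values, for each full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up reporting pattern, Monday first: weekend counts are held back
# and added to the following Monday's report.
week = [300, 100, 100, 100, 100, 0, 0]
reported = week * 3

smoothed_7 = trailing_mean(reported, window=7)  # spans one full weekly cycle
smoothed_3 = trailing_mean(reported, window=3)  # shorter than the cycle
print(min(smoothed_7), max(smoothed_7))
print(min(smoothed_3), max(smoothed_3))
```

A window that spans the full weekly cycle removes the Monday spike entirely, while a shorter window reacts faster but still oscillates. That is the trade-off between detection speed and noise sensitivity described above.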

The predictive analytics model is built with the open-source R programming language using the Tidyverse family of packages.




Summary Results

World

There are 196 countries represented in the Johns Hopkins University data set. The Gross Domestic Product (GDP) data shown above represent per capita GDP at purchasing power parity (PPP) in international (Geary-Khamis) dollars. These data are obtained from the Countries by GDP (PPP) per capita (Wikipedia) web page. Only countries with a risk prediction value above 25 are shown.




U.S.

There have been 58,107,145 total COVID-19 cases (704,369 new cases per day) and 831,541 deaths (2,113 new deaths per day) in the United States from January 21, 2020 to January 05, 2022.









Comparison with the European Union

The aggregated data from Johns Hopkins University CSSE were used to calculate a combined case rate for the 27 member states of the European Union (EU). The combined data were used to compare the pandemic response in the EU with the response in the U.S. over time. The rise in infections in the EU preceded the rise in the U.S. For time comparison, the 2,500th case recorded in the EU occurred on March 2, 2020. The 2,500th case in the U.S. was recorded on March 14, 2020. This comparison is minimally useful, however, because the populations of the two regions differ (U.S. - 328,239,523; EU - 447,206,135) and there are a number of other factors (e.g., population density, health care systems, prevalence of comorbidities) that are not consistent between the two.
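The two operations behind this comparison (summing member-state series into one combined series and finding the date on which a cumulative threshold is crossed) can be sketched in Python as below, along with the per-capita normalization that the population caveat calls for. The member-state counts, start date, and function names are made up for illustration; only the population figures and the U.S. cumulative case total are taken from this document.

```python
from datetime import date, timedelta

US_POPULATION = 328_239_523  # figure quoted in the text
EU_POPULATION = 447_206_135  # figure quoted in the text


def combine(series_by_member):
    """Sum equal-length cumulative series element-wise across members."""
    return [sum(day) for day in zip(*series_by_member.values())]


def first_date_reaching(cumulative, start, threshold):
    """Return the first date on which the cumulative count >= threshold."""
    for offset, total in enumerate(cumulative):
        if total >= threshold:
            return start + timedelta(days=offset)
    return None  # threshold never reached in this series


def rate_per_100k(count, population):
    """Scale a count to a rate per 100,000 residents."""
    return count * 100_000 / population


# Made-up cumulative counts for two hypothetical member states.
members = {"A": [500, 900, 1400, 2100], "B": [200, 400, 900, 1500]}
combined = combine(members)
crossed = first_date_reaching(combined, date(2020, 3, 1), 2500)

# U.S. cumulative cases reported in this document, per 100,000 residents.
us_cumulative_rate = rate_per_100k(58_107_145, US_POPULATION)
print(combined, crossed, round(us_cumulative_rate))
```

Dividing by population puts the two regions on a common scale (the U.S. total works out to roughly 17,700 cumulative reported cases per 100,000 residents), although it does not address the other differences listed above.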








Individual States

52 states currently have risk predictions above 25.






Counties

There are 3,221 U.S. counties represented in the New York Times data set.






Community Mobility Data

To assist the global COVID-19 pandemic response, Google has made available detailed mobility estimates, expressed relative to local baselines, that are derived from mobile phone and other data of the kind used by traffic and navigation services such as Google Maps and Waze. The data are provided by Google in the form of Community Mobility Reports.

As global communities respond to COVID-19, we’ve heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps could be helpful as they make critical decisions to combat COVID-19.

These Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
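The mobility figures are published as percent changes from a baseline, which Google documents as the median value for the corresponding day of the week over a five-week period in January and February 2020. The Python sketch below shows that arithmetic on made-up visit counts. The published reports already contain the percent changes, so this is only an illustration of what the numbers mean, and the function names are hypothetical.

```python
from datetime import date
from statistics import median


def weekday_baseline(observations):
    """Median value per weekday (0 = Monday) from (date, value) pairs."""
    by_weekday = {}
    for day, value in observations:
        by_weekday.setdefault(day.weekday(), []).append(value)
    return {wd: median(vals) for wd, vals in by_weekday.items()}


def percent_change(day, value, baseline):
    """Percent change of `value` from the baseline for that weekday."""
    base = baseline[day.weekday()]
    return (value - base) / base * 100


# Made-up visit counts for three Mondays in the baseline period.
baseline_obs = [
    (date(2020, 1, 6), 980),
    (date(2020, 1, 13), 1000),
    (date(2020, 1, 20), 1040),
]
baseline = weekday_baseline(baseline_obs)

# A later Monday with a made-up count of 600 visits.
change = percent_change(date(2020, 4, 6), 600, baseline)
print(change)
```

Comparing each day with the same weekday in the baseline keeps ordinary weekday and weekend differences from showing up as changes in mobility.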

The data used for the analysis below are current through January 03, 2022.




U.S.


Note: The dotted grey line on each of the mobility charts represents the date (March 13, 2020) on which the U.S. declared a National Emergency Concerning the Novel Coronavirus Disease (COVID-19) Outbreak.




Individual States